Unsupervised Prediction of Acceptability Judgements
نویسندگان
چکیده
In this paper we present the task of unsupervised prediction of speakers’ acceptability judgements. We use a test set generated from the British National Corpus (BNC) containing both grammatical sentences and sentences containing a variety of syntactic infelicities introduced by round trip machine translation. This set was annotated for acceptability judgements through crowd sourcing. We trained a variety of unsupervised language models on the original BNC, and tested them to see the extent to which they could predict mean speakers’ judgements on the test set. To map probability to acceptability, we experimented with several normalisation functions to neutralise the effects of sentence length and word frequencies. We found encouraging results with the unsupervised models predicting acceptability across two different datasets. Our methodology is highly portable to other domains and languages, and the approach has potential implications for the representation and the acquisition of linguistic knowledge.
منابع مشابه
Grammaticality, Acceptability, and Probability: A Probabilistic View of Linguistic Knowledge
The question of whether humans represent grammatical knowledge as a binary condition on membership in a set of well-formed sentences, or as a probabilistic property has been the subject of debate among linguists, psychologists, and cognitive scientists for many decades. Acceptability judgments present a serious problem for both classical binary and probabilistic theories of grammaticality. Thes...
متن کاملAcceptability Prediction by Means of Grammaticality Quantification
We propose in this paper a method for quantifying sentence grammaticality. The approach based on Property Grammars, a constraint-based syntactic formalism, makes it possible to evaluate a grammaticality index for any kind of sentence, including ill-formed ones. We compare on a sample of sentences the grammaticality indices obtained from PG formalism and the acceptability judgements measured by ...
متن کاملMust diphone synthesis be so unnatural?
An English utterance was synthesized in four versions using sets of diphones produced under four different prosodic and contextual conditions. The synthesis used either accented diphones only or appropriately located accented and unaccented diphones, with each of these conditions being repeated using neutral-context and differentiated-context diphones. They were presented to two listener groups...
متن کاملAlert correlation and prediction using data mining and HMM
Intrusion Detection Systems (IDSs) are security tools widely used in computer networks. While they seem to be promising technologies, they pose some serious drawbacks: When utilized in large and high traffic networks, IDSs generate high volumes of low-level alerts which are hardly manageable. Accordingly, there emerged a recent track of security research, focused on alert correlation, which ext...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2015